Named Entity Disambiguation for German News Articles
Authors
Abstract
Named entity disambiguation has become an important research area, providing the basis for improving search engine precision and for enabling semantic search. Current approaches to named entity disambiguation are usually based on exploiting structured semantic and linguistic resources (e.g. WordNet, DBpedia). Unfortunately, each of these resources, taken on its own, covers insufficient information for the task. On the one hand, WordNet comprises a relatively small number of named entities; on the other hand, DBpedia provides only little context for named entities. Our approach is based on the use of multilingual Wikipedia data. We show how the combination of multilingual resources can be used for named entity disambiguation. Based on a German and an English document corpus, we evaluate various similarity measures and algorithms for extracting data for named entity disambiguation. We show that the intelligent filtering of context data and the combination of multilingual information provide high-quality named entity disambiguation results.
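The abstract does not say which similarity measures were evaluated, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a bag-of-words cosine similarity between the text around a mention and the context texts of each candidate entity, with the contexts from several language editions merged into one vector. The function names (`bag_of_words`, `cosine_similarity`, `disambiguate`) and the example candidates are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(mention_context, candidates):
    """Pick the candidate whose merged context is most similar to the mention.

    candidates maps an entity label to a list of context texts, for example
    one text per language edition (an assumption made for this sketch).
    """
    mention_vec = bag_of_words(mention_context)
    scores = {}
    for entity, texts in candidates.items():
        merged = Counter()
        for text in texts:
            merged.update(bag_of_words(text))  # combine multilingual context
        scores[entity] = cosine_similarity(mention_vec, merged)
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical candidates with one English and one German context each.
candidates = {
    "Paris (city)": [
        "Paris is the capital of France",
        "Paris ist die Hauptstadt von Frankreich",
    ],
    "Paris (mythology)": [
        "Paris is a prince of Troy in Greek mythology",
        "Paris ist ein Prinz von Troja",
    ],
}
best, scores = disambiguate("Die Hauptstadt Paris liegt in Frankreich", candidates)
print(best, scores)
```

A real system would need the context filtering the abstract mentions (for instance, dropping stop words or weighting terms), which this sketch leaves out.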
Similar resources
GerNED: A German Corpus for Named Entity Disambiguation
Determining the real-world referents for name mentions of persons, organizations and other named entities in texts has become an important task in many information retrieval scenarios and is referred to as Named Entity Disambiguation (NED). While comprehensive datasets support the development and evaluation of NED approaches for English, there are no public datasets to assess NED systems for ot...
Large-Scale Named Entity Disambiguation Based on Wikipedia Data
This paper presents a large-scale system for the recognition and semantic disambiguation of named entities based on information extracted from a large encyclopedic collection and Web search results. It describes in detail the disambiguation paradigm employed and the information extraction process from Wikipedia. Through a process of maximizing the agreement between the contextual information ex...
Isaac Bloomberg Meets Michael Bloomberg: Better Entity Disambiguation for the News
This paper shows the implementation and evaluation of the Entity Linking or Named Entity Disambiguation system used and developed at Bloomberg. In particular, we present and evaluate a methodology and a system that do not require the use of Wikipedia as a knowledge base or training corpus. We present how we built features for disambiguation algorithms from the Bloomberg News corpus, and how we ...
A Knowledge-Based Approach to Named Entity Disambiguation in News Articles
Named entity disambiguation has been one of the main challenges for research in Information Extraction and the development of the Semantic Web. Therefore, it has attracted much research effort, with various methods introduced for different domains, scopes, and purposes. In this paper, we propose a new approach that is not limited to some entity classes and does not require well-structured texts. The nove...
Multilingual Disambiguation of Named Entities Using Linked Data
One key step towards extracting structured data from unstructured data sources is the disambiguation of entities. With AGDISTIS, we provide a time-efficient, state-of-the-art, knowledge-base-agnostic and multilingual framework for the disambiguation of RDF resources. The aim of this demo is to present the English, German and Chinese version of our framework based on DBpedia. We show the results...